26 Medicine and Disease
Much of the business of bioinformatics concerns the correlation of phenotype with genotype, with the transcriptome and proteome acting as intermediaries. 4 Bioinformatics gives an unprecedented ability to scrutinize these intermediate levels and to establish correlations far more extensively, and in far more detail, than was possible before the advent of high-throughput sequencing and other omics technologies, together with the computing power needed to store, handle and analyse huge datasets. This ability is revolutionizing medicine. In this spirit, one may represent the human being as a gigantic table of correlations, comprising successive columns of genes and genetic variation, environmental conditions (the exposome), protein levels, and physiological states and interactions. 5
Medicine is especially concerned with investigating physiological disorders, and the techniques of bioinformatics allow one to establish correlations between those disorders and variations in the genome and proteome of a patient; 6 medical applications of bioinformatics are often concerned with investigating deleterious genetic variation and abnormal patterns of protein expression.
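One common way of relating a disorder to a genetic variant is a case-control comparison of allele frequencies. The following is a minimal Python sketch with invented genotype counts for a single SNP: it converts genotype counts into allele counts and computes the allelic odds ratio together with a Woolf (log-based) 95% confidence interval. The function names and data are hypothetical, and a real association study would test many variants and correct for multiple testing.

```python
from math import exp, log, sqrt


def allele_counts(genotypes):
    """Convert genotype counts (AA, Aa, aa), with 'a' the minor allele,
    into (minor, major) allele counts."""
    hom_major, het, hom_minor = genotypes
    return 2 * hom_minor + het, 2 * hom_major + het


def allelic_odds_ratio(cases, controls, z=1.96):
    """Allelic odds ratio for the minor allele, with a Woolf confidence interval."""
    a, b = allele_counts(cases)     # minor, major alleles among cases
    c, d = allele_counts(controls)  # minor, major alleles among controls
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = exp(log(odds_ratio) - z * se)
    hi = exp(log(odds_ratio) + z * se)
    return odds_ratio, (lo, hi)


# Hypothetical genotype counts (AA, Aa, aa) for one SNP.
cases = (300, 150, 50)
controls = (400, 90, 10)
or_, (lo, hi) = allelic_odds_ratio(cases, controls)
print(f"allelic OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval lying entirely above 1, as in this toy example, suggests that the minor allele is more common among cases; it does not establish that the variant causes the disorder.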
More and more data on the genotypes of individuals are being gathered. Millions of single-nucleotide polymorphisms (SNPs) are now documented, and studies genotyping hundreds of SNPs in thousands of people are now feasible. 7 As pointed out earlier (Sect. 14.4.3), most of the genetic variability across human populations can be accounted for by SNPs, and most of the SNP variation can be grouped into a small number of haplotypes. 8 This growing database might be useful for elucidating the genetic basis of disease, or of susceptibility to disease, and hence for guiding preventive treatment of those who are screened routinely. This does, however, raise the ethical difficulties associated with prevention, which is not properly part of medicine (Ramsden 2021). The use of genetic information is discussed further in Sect. 26.3.
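The grouping of SNP variation into haplotypes is closely tied to linkage disequilibrium (footnote 8). As a minimal Python sketch, the standard pairwise measures can be computed for two biallelic loci: D = p_AB - p_A p_B, the normalized D' = D/D_max, and r^2 = D^2 / [p_A(1 - p_A) p_B(1 - p_B)]. The sketch assumes that phased two-locus haplotypes are already available, written as strings such as "AB" or "ab"; in practice phase usually has to be inferred statistically from unphased genotypes, and the function name and data here are hypothetical.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return (D, D', r^2) for two biallelic loci from phased haplotypes.

    Each haplotype is a 2-character string: uppercase marks the A or B allele,
    lowercase the a or b allele. D' keeps the sign of D (often reported as |D'|).
    """
    n = len(haplotypes)
    freq = Counter(haplotypes)
    p_AB = freq["AB"] / n
    p_A = sum(v for h, v in freq.items() if h[0] == "A") / n
    p_B = sum(v for h, v in freq.items() if h[1] == "B") / n
    D = p_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies.
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D ** 2 / denom if denom > 0 else 0.0  # undefined if a locus is monomorphic
    return D, d_prime, r2


# Hypothetical sample of 100 phased haplotypes.
haps = ["AB"] * 40 + ["ab"] * 40 + ["Ab"] * 10 + ["aB"] * 10
D, d_prime, r2 = linkage_disequilibrium(haps)
print(f"D = {D:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")  # D = 0.150, D' = 0.60, r^2 = 0.36
```

High r^2 between neighbouring SNPs is what allows one SNP to stand in for others in the same haplotype, which is why genotyping a modest number of well-chosen SNPs can capture much of the variation.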
The wish to develop genetic screening implies a need for a much faster and cheaper way of screening for mutations than is possible with genome sequencing. The classic method is to digest the gene with restriction enzymes, separate the fragments by gel electrophoresis and detect them by Southern blotting (see footnote 2 in Chap. 18). Although direct genotyping by allele-specific hybridization is possible in simple genomes (e.g., yeast), the complexity of the human genome renders this
4 Indeed, one could view the organism as a gigantic hidden Markov model (Sect. 17.5.2), in which the genes control switching between physiological states via protein expression. Unlike the simpler models considered earlier, here the outputs could themselves act on the hidden layers.
5 Since the physiological column includes entries for neurophysiological states, it might be tempting to continue the table by adding a column for the conscious experiences corresponding to the physiological and other entries. One must be careful to note, however, that conscious experience is in a different category from the entries in the preceding columns (Ramsden 2001). Hence, correlation cannot be taken to imply identity (in the same way, when a piece of electronic hardware solves a quadratic equation, the two roots it derives are embodied in the hardware, but it makes no sense to say that the hardware has two roots, even though those roots have well-defined correlates in the electronic states of the circuit components).
6 Mossink et al. (2012).
7 These data can also be used to infer population structures (Jakobsson et al. 2008).
8 These investigations are closely related to those of linkage disequilibrium (nonrandom association
between alleles at different loci).